ABSTRACT

Some properties of the sample mean and sample variance of a long memory process are described. It is shown that for a particular class of long memory processes the asymptotic relative efficiency of the $k$-decimated sample mean $\bar{x}_n(k)$ (formed by taking the mean of every $k$-th observation) with respect to the sample mean $\bar{x}_n$ of all $n$ observations is 1, but that the deficiency (as defined by Hodges and Lehmann) of $\bar{x}_n(k)$ with respect to $\bar{x}_n$ is infinite. It is also shown that the sample variance $s_n^2$ of a long memory process can be badly biased toward 0: for any integer $N$ and every $\epsilon > 0$ there exists a long memory process with variance $\sigma^2$ such that $E s_n^2 < \epsilon\sigma^2$ for all sample sizes $n \le N$. These properties of the sample mean and variance are not shared by "ordinary" stationary processes such as ARMA processes.
1. Introduction and Summary

Let $\{x_t : t = 0, \pm 1, \pm 2, \ldots\}$ represent a wide-sense (or covariance) stationary process that is zero mean, real-valued, discrete time, and purely continuous. Let

$$C_x(\tau) = E\, x_t x_{t+\tau}, \qquad \tau = 0, \pm 1, \pm 2, \ldots$$


represent the autocovariance function (a.c.f.) of the process, and let $S_x$ represent its spectral density function (s.d.f.) defined on the interval $[-\pi, \pi]$. In the classic treatment of stationary processes that forms the basis for time series analysis, it is usually assumed that $C_x(\tau) \to 0$ "fairly rapidly" as $\tau \to \infty$ or that $S_x$ is bounded on $[-\pi, \pi]$. In particular, the regularity condition $\sum |C_x(\tau)| < \infty$ (which implies that $S_x$ is continuous everywhere and hence bounded) is often seen as a hypothesis in various theorems (see, for example, the discussion in section 1.3 of Brillinger[5]). This condition is satisfied by many useful models for time series, including all stationary and invertible autoregressive moving average (ARMA) processes with a finite number of parameters (see, for example, Box and Jenkins[4]).

In spite of the pervasiveness of these regularity conditions in time series analysis, there are many time series in nature for which the appropriateness of these conditions is questionable. Empirically such series have two characteristics: the observed correlation between data points that are $\tau$ units apart does not decrease at the rapid exponential rate of ARMA models as $\tau$ increases; and estimates of the s.d.f. show that it peaks sharply at the origin. Examples include the difference in apparent time as generated by two different atomic clocks[20]; fluctuations in the earth's rate of rotation[17]; the annual minimum flood levels on the Nile River[16]; density fluctuations of sand particles passing through an hourglass[23]; density fluctuations of traffic on an expressway[18]; pitch fluctuations in Western music[25]; fluctuations in the arrival times of pulses from pulsars[7]; voltage fluctuations across cell membranes[14]; deviations in resistance of a 56K ohm India ink resistor[26]; and a wide variety of economic time series[9].

To provide models for these many time series, we consider in this report stationary processes such that $\sum |C_x(\tau)| = \infty$ or such that $S_x$ has a singularity at the origin. Such processes have been termed long memory processes (l.m.p.'s) in the literature. In section 2 below, we discuss some precise definitions of a l.m.p. and describe an important example of a l.m.p., namely, the power law l.m.p. of order $-1 < \alpha < 0$, for which

$$\lim_{\omega \to 0} \frac{S_x(\omega)}{|\omega|^{\alpha}} = h$$

for some constant $h$. A specific example of a power law l.m.p. that is analytically tractable is the fractional difference process (f.d.p.) introduced by Granger and Joyeux[10].

In section 3, we show that, if $\{x_t\}$ is a power law l.m.p. of order $\alpha$ with mean $\mu_x$, the sample mean $\bar{x}_n$ converges a.s. to $\mu_x$ and

$$\operatorname{var} \bar{x}_n = O\!\left(\frac{1}{n^{1+\alpha}}\right).$$


Next we show that, if a power law l.m.p. has a s.d.f. such that the limits $S_x(\omega\pm)$ exist for all $\omega$ in $(-\pi, \pi)$, then, for any integer $k \ge 2$, its $k$-decimated sample mean (formed by taking the mean of every $k$-th observation) is asymptotically as efficient as the ordinary sample mean. As a converse, we prove that, if a stationary process has a summable a.c.f., there exists a $k_0$ such that the asymptotic relative efficiency of the $k$-decimated sample mean with respect to the ordinary sample mean is strictly less than 1 for all $k \ge k_0$. We next show that for a f.d.p. the deficiency (as defined by Hodges and Lehmann) of the $k$-decimated sample mean with respect to the sample mean is infinite. In passing, we give some necessary and sufficient conditions on the a.c.f. of a stationary process such that

$$\operatorname{var} \bar{y}_{n+1} < \operatorname{var} \bar{y}_n .$$

In section 4, we show that the sample variance $s_n^2$ of a f.d.p. can be a seriously biased estimator of the process variance. In particular, we show that, for every sample size $N$ and every $\epsilon > 0$, there exists a long memory process with variance $\sigma^2$ such that $E s_n^2 < \epsilon\sigma^2$ for all $n \le N$. In practice, this result means that the process variance can be a poor measure of variability of a l.m.p.

2.  Characterization  and  Models  for  Long  Memory  Processes 

The processes of main interest in this report are such that $\sum |C_x(\tau)| = \infty$; accordingly, we make the following

Definition 1: A wide-sense stationary process $\{x_t\}$ with a.c.f. $C_x$ is called a long memory process (l.m.p.) in the covariance sense if $\sum |C_x(\tau)| = \infty$.

The  qualifier  "in  the  covariance  sense"  is  needed  since  other  characterizations  of  a 
l.m.p.  are  possible  and  are  not  necessarily  equivalent  to  the  covariance  definition.  In 
addition  to  this  covariance  characterization,  Parzen[19]  proposes  definitions  in  terms  of  the 
s.d.f.  and  four  other  criteria.  For  what  follows,  a  characterization  in  terms  of  the  s.d.f.  is 
convenient,  so  we  first  review  under  what  conditions  1)  a  stationary  process  has  a  s.d.f.  and 
2)  a  given  function  is  a  s.d.f.  for  some  stationary  process. 

By the Wiener-Khinchin theorem (see, for example, volume 1, p.222 of Priestley[22]), a necessary and sufficient condition that $C_x$ be the a.c.f. for some stationary process $\{x_t\}$ is that there exists a function $F_x$ (the spectral distribution function) defined on $[-\pi, \pi]$ such that $F_x(-\pi) = 0$; $0 < F_x(\pi) < \infty$; $F_x$ is non-decreasing on $[-\pi, \pi]$; and

$$C_x(\tau) = \int_{-\pi}^{\pi} e^{i\omega\tau}\, dF_x(\omega).$$

If $F_x$ is absolutely continuous with respect to Lebesgue measure, then

$$F_x(\lambda) = \int_{-\pi}^{\lambda} S_x(\omega)\, d\omega,$$

where $S_x$ is called a s.d.f. for the process. $S_x$ is necessarily non-negative on $[-\pi, \pi]$; even; and unique up to sets of Lebesgue measure zero since, if $S_x'$ differs only on a set of measure zero from $S_x$, $S_x'$ is also a s.d.f. for $\{x_t\}$. Moreover,

$$C_x(\tau) = \int_{-\pi}^{\pi} e^{i\omega\tau} S_x(\omega)\, d\omega = 2\int_0^{\pi} \cos(\omega\tau)\, S_x(\omega)\, d\omega.$$

If $S$ is to be a s.d.f. for some stationary process, it follows from the Wiener-Khinchin theorem that it is necessary and sufficient that $S$ be even, non-negative on $[-\pi, \pi]$ a.e., positive on a set of positive measure and finitely integrable.

Definition 2: A wide-sense stationary process $\{x_t\}$ with s.d.f. $S_x$ is called a l.m.p. in the s.d.f. sense if the spectral ratio

$$\frac{\operatorname{ess\,sup} S_x(\omega)}{\operatorname{ess\,inf} S_x(\omega)}$$

is infinite.

Both definitions 1 and 2 are essentially due to Parzen[19]; however, he does not discuss the relationship between his covariance and s.d.f. characterizations, so the following lemma is of some interest.

Lemma  3:  A  l.m.p.  in  the  covariance  sense  need  not  be  a  l.m.p.  in  the  s.d.f.  sense.  The 
converse  is  also  true. 

Proof: Let $S_x(\omega) = 1 + I_{[-b,b]}(\omega)$, where $I_A$ is the indicator function for the set $A$ and $0 < b < \pi$. Then the spectral ratio is 2, but $C_x(\tau) = (2\sin \tau b)/\tau$ for $\tau \ne 0$, so $\sum |C_x(\tau)| = \infty$. Conversely, let $S_x(\omega) = 4\sin^2\frac{\omega}{2}$. Then the spectral ratio is infinite, but $C_x(\tau) = 0$ for $|\tau| \ge 2$, so $\sum |C_x(\tau)| < \infty$. □

Of  course,  there  exist  processes  that  are  l.m.p.’s  in  both  senses  and  also  processes  that 
are  not  l.m.p.’s  in  either  sense.  The  fractional  difference  processes  that  are  described  below 
are  examples  of  the  former;  stationary  and  invertible  ARMA  processes  with  a  finite  number 
of  parameters  are  examples  of  the  latter. 

The proof of lemma 3 suggests that definition 2 is not a very good one for our purposes: a process with a s.d.f. of $4\sin^2\frac{\omega}{2}$ is just the first difference of a white noise process and intuitively should not qualify as a l.m.p. An examination of the time series that are mentioned in section 1 suggests a narrowing of the definition of a l.m.p. in the s.d.f. sense. Estimates of the s.d.f. of those time series show a concentration of power at low Fourier frequencies with a smooth tapering off toward higher frequencies. Accordingly, we make the following

Definition 4: A wide-sense stationary process $\{x_t\}$ with s.d.f. $S_x$ is called a l.m.p. in the restricted s.d.f. sense if $S_x$ satisfies two conditions: 1) it is bounded above on $[\delta, \pi]$ for every $\delta > 0$ and 2) the limit $S_x(0+) = \infty$.

We first need to show that the class of l.m.p.'s in the restricted s.d.f. sense is not empty.

Lemma 5: Let the function $S$ be integrable, even, non-negative, and satisfy the two restrictions on a s.d.f. of a l.m.p. in the restricted s.d.f. sense. Suppose that $S$ is such that

$$\lim_{\omega \to 0} \frac{S(\omega)}{|\omega|^{\alpha}} = h \qquad (6)$$

for some $0 > \alpha > -1$ and $0 < h < \infty$. Then $S$ is in fact a s.d.f. for some stationary process $\{x_t\}$.

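Proof: It follows from the Wiener-Khinchin theorem that we need only show that $S$ integrates finitely over $[-\pi, \pi]$ to a positive value. Since $S(0+) = S(0-) = \infty$, there exists a $\delta > 0$ such that, if $|\omega| < \delta$, $S(\omega) \ge 1$. Then

$$\int_{-\pi}^{\pi} S(\omega)\, d\omega \ge \int_{-\delta}^{\delta} S(\omega)\, d\omega \ge 2\delta,$$

so the integral of $S$ is positive. To see that it integrates finitely, pick $\epsilon > 0$ and find a $\delta > 0$ such that

$$\left| S(\omega) - h|\omega|^{\alpha} \right| < \epsilon |\omega|^{\alpha}$$

when $|\omega| < \delta$. We have

$$\int_{-\pi}^{\pi} S(\omega)\, d\omega \le 2\int_0^{\delta} \left\{ |S(\omega) - h\omega^{\alpha}| + h\omega^{\alpha} \right\} d\omega + 2\int_{\delta}^{\pi} S(\omega)\, d\omega \le (2\epsilon + 2h)\frac{\delta^{1+\alpha}}{1+\alpha} + 2(\pi - \delta)B < \infty,$$

where $B \equiv \sup_{\omega \in [\delta, \pi]} S(\omega)$. □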

Obviously  a  l.m.p.  in  the  restricted  s.d.f.  sense  is  also  a  l.m.p.  in  the  s.d.f.  sense.  The 
relationship  between  the  restricted  s.d.f.  and  the  covariance  characterization  is  given  by 

Lemma  7:  A  l.m.p.  in  the  restricted  s.d.f.  sense  is  also  a  l.m.p.  in  the  covariance  sense. 

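Proof: Let $C_x$ and $S_x$ be the a.c.f. and a s.d.f., respectively, for $\{x_t\}$, a l.m.p. in the restricted s.d.f. sense. We use the same conventions for Fourier series coefficients as Titchmarsh[24] in what follows. Since $C_x(\tau)/\pi$ is simply the $\tau$-th Fourier coefficient of $S_x$,

$$s \equiv \frac{1}{2\pi}\left\{ C_x(0) + 2\sum_{\tau=1}^{\infty} C_x(\tau) \right\}$$

is the Fourier series of $S_x$ evaluated at $\omega = 0$. Let

$$s_n \equiv \frac{1}{2\pi}\left\{ C_x(0) + 2\sum_{\tau=1}^{n} C_x(\tau) \right\}$$

be the $n$-th partial sum associated with $s$. Now the $n$-th arithmetic mean or Cesàro sum of $\{s_n\}$ is defined by

$$\sigma_n \equiv \frac{1}{n}\sum_{\tau=0}^{n-1} s_\tau = \frac{1}{2\pi n}\int_{-\pi}^{\pi} \frac{\sin^2\frac{n\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_x(\omega)\, d\omega$$

by a standard argument in Fourier analysis (see, for example, p.412 of Titchmarsh[24]). We now use part of Fejér's theorem in the form given by p.89 of Zygmund[27]: since the limits $S_x(0\pm)$ exist and are both $\infty$, $\sigma_n \to \infty$. Next define

$$s_n^* \equiv \frac{1}{2\pi}\left\{ C_x(0) + 2\sum_{\tau=1}^{n} |C_x(\tau)| \right\}$$

and

$$\sigma_n^* \equiv \frac{1}{n}\sum_{\tau=0}^{n-1} s_\tau^* .$$

Since $s_n^* \ge s_n$, $\sigma_n^* \ge \sigma_n$ and $\sigma_n^* \to \infty$ also. Assume for the moment that $s_n^*$ does not diverge to $\infty$. Since $s_n^*$ is a non-decreasing sequence, it has a finite limit, say, $s^*$. Pick $N$ so large that $|s_n^* - s^*| < \epsilon$ for all $n \ge N$. For $n > N$

$$|\sigma_n^* - s^*| \le \frac{1}{n}\left\{ \sum_{\tau=0}^{N-1} |s_\tau^* - s^*| + \sum_{\tau=N}^{n-1} |s_\tau^* - s^*| \right\} \le \frac{K}{n} + \epsilon,$$

where $K$ is a constant independent of $n$. Thus $\sigma_n^*$ converges to $s^*$ also, which is a contradiction. We conclude that $s_n^* \to \infty$, which readily implies the lemma. □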

The next lemma concludes our discussion on the relationship among s.d.f., restricted s.d.f. and covariance sense l.m.p.'s.

Lemma  8:  There  exist  processes  that  are  l.m.p.’s  in  both  the  s.d.f.  and  covariance  sense  but 
not  in  the  restricted  s.d.f.  sense. 

Proof: Let $S_x$ be a s.d.f. for a l.m.p. in the restricted s.d.f. sense, and define $S_y(\omega) = S_x(\pi - |\omega|)$. $S_y$ is clearly not a s.d.f. for any l.m.p. in the restricted s.d.f. sense, but is a s.d.f. for some l.m.p. $\{y_t\}$ in the s.d.f. sense. It follows easily that

$$C_y(\tau) = (-1)^{\tau} C_x(\tau).$$

Lemma 7 shows that $\sum |C_x(\tau)| = \infty$, so $\sum |C_y(\tau)| = \infty$ also and the lemma follows. □

Figure  1  summarizes  lemmas  3,  7,  and  8  schematically.  Each  point  in  this  figure 
represents  a  l.m.p.  in  whatever  senses  the  loops  that  surround  it  are  labelled. 


[Figure 1. Loops labelled: covariance; restricted s.d.f.; s.d.f.]


The rate at which the s.d.f. for a l.m.p. in the restricted s.d.f. sense diverges to infinity as $|\omega| \to 0$ is important in some of the results to follow. Equation (6) gives one particular rate, namely that $S$ is approximately proportional to $|\omega|^{\alpha}$ near the origin.

Definition 9: The function $S$ is said to obey a power law in the limit (as $\omega \to 0$) of order $\alpha$, where $\alpha$ is any real number, if

$$\lim_{\omega \to 0} \frac{S(\omega)}{|\omega|^{\alpha}} = h$$

for some $0 < h < \infty$.

Definition 10: A wide-sense stationary process $\{x_t\}$ with a s.d.f. $S_x$ is called a power law l.m.p. of order $\alpha$, where $-1 < \alpha < 0$, if $S_x$ 1) obeys a power law in the limit of order $\alpha$ and 2) is bounded above on every interval $[\delta, \pi]$ with $\delta > 0$.

It is obvious that a power law l.m.p. is just a special type of l.m.p. in the restricted s.d.f. sense. The basis for definition 10 is that empirical s.d.f.'s for the time series mentioned in section 1 often seem to have a linear appearance near zero Fourier frequency when they are plotted on a log-log scale. The slope of the line is given by the exponent $\alpha$. For example, Mohr[16] finds that $\alpha = -.67$ gives a good fit for data on the yearly minimum water levels for the Nile River.

Let us now consider an example of a power law l.m.p. that is analytically tractable.

Definition 11: A wide-sense stationary process $\{x_t\}$ is called a fractional difference process (f.d.p.) of order $\alpha > -1$ if it has a s.d.f. given by

$$S_x(\omega) = h\left(4\sin^2\frac{\omega}{2}\right)^{\alpha/2} \qquad (12)$$

for some $0 < h < \infty$. (That $S_x$ obeys a power law in the limit of order $\alpha$ follows from an application of l'Hospital's rule.)


The name "fractional difference" is derived from the following considerations. Suppose that $\{y_t\}$ is a white noise process with a s.d.f. $S_y(\omega) = h$. Let $\{z_t\}$ represent the process formed by taking the $n$-th finite difference of $\{y_t\}$. Then $\{z_t\}$ has a s.d.f. given by

$$S_z(\omega) = h\left(4\sin^2\frac{\omega}{2}\right)^{n}.$$

Thus $S_x$ in equation (12) may be regarded as a s.d.f. for a "fractionally differenced" white noise process. (The order of the f.d.p. in definition 11 is defined as $\alpha$ instead of $\alpha/2$ to remind us that $S_x(\omega) \approx h|\omega|^{\alpha}$ for $\omega$ close to 0.)
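The power law behaviour of equation (12) near the origin can be checked numerically. The following is a minimal sketch, not part of the original report; the function names `fdp_sdf` and `power_law_ratio` are illustrative assumptions. It evaluates the s.d.f. of a f.d.p. and the ratio $S_x(\omega)/|\omega|^{\alpha}$, which should approach $h$ as $\omega \to 0$.

```python
import math

def fdp_sdf(omega, alpha, h=1.0):
    """S_x(omega) = h * (4 sin^2(omega/2))^(alpha/2), as in equation (12)."""
    return h * (4.0 * math.sin(omega / 2.0) ** 2) ** (alpha / 2.0)

def power_law_ratio(omega, alpha, h=1.0):
    """Ratio S_x(omega) / |omega|^alpha; tends to h as omega -> 0."""
    return fdp_sdf(omega, alpha, h) / abs(omega) ** alpha

if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for omega in (1.0, 0.1, 0.01, 0.001):
        print(omega, power_law_ratio(omega, alpha=-0.5, h=2.0))
```

With $\alpha = 2$ the same function returns the s.d.f. of a first difference of white noise, which is a convenient sanity check on the exponent convention.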

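This model is due to Granger and Joyeux[10], to whom the reader is referred for more details. They show that the a.c.f. of this process is given by

$$C_x(\tau) = -2h\sin\frac{\pi\alpha}{2}\,\Gamma(1+\alpha)\,\frac{\Gamma(\tau - \frac{\alpha}{2})}{\Gamma(\tau + 1 + \frac{\alpha}{2})} \equiv h_\alpha \frac{\Gamma(\tau - \frac{\alpha}{2})}{\Gamma(\tau + 1 + \frac{\alpha}{2})} \qquad (13)$$

for $\tau \ge 0$ when $-1 < \alpha < 2$ and $\alpha \ne 0$. We note that it is easy to compute $C_x$ step by step since

$$C_x(\tau) = C_x(\tau - 1)\,\frac{2\tau - \alpha - 2}{2\tau + \alpha} \qquad (14)$$

for $\tau = 1, 2, \ldots$. It is sometimes useful to have a good approximation to $C_x$, so the following lemma gives an error analysis for an approximation due to Granger and Joyeux.

Lemma 15: $C_x(\tau) = \dfrac{h_\alpha}{\tau^{1+\alpha}}\left(1 + O\!\left(\dfrac{1}{\tau^2}\right)\right)$ for $\tau \ge 1$; moreover, if $-1 < \alpha < 0$, the bounding constant for the $O$ term may be taken as .35.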

Proof: The proof of this lemma is messy and uninteresting. The interested reader is referred to Percival[21] for details. □

An immediate corollary is that $\sum |C_x(\tau)| = \infty$ if $-1 < \alpha < 0$ (as must be true by lemma 7 also) and $\sum |C_x(\tau)| < \infty$ if $\alpha \ge 0$.
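The step-by-step recursion (14) and the approximation of lemma 15 can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, $C_x(0)$ is normalized to 1, the constant $h_\alpha$ is obtained from the $\tau = 0$ case of the a.c.f. formula as $h_\alpha = C_x(0)\,\Gamma(1+\alpha/2)/\Gamma(-\alpha/2)$, and the error term of the approximation is read as being of order $1/\tau^2$.

```python
import math

def fdp_acf(alpha, max_lag, c0=1.0):
    """C_x(0), ..., C_x(max_lag) via C(t) = C(t-1) * (2t - alpha - 2) / (2t + alpha)."""
    acf = [c0]
    for tau in range(1, max_lag + 1):
        acf.append(acf[-1] * (2 * tau - alpha - 2) / (2 * tau + alpha))
    return acf

def fdp_acf_approx(alpha, tau, c0=1.0):
    """Approximation h_alpha / tau^(1 + alpha) for tau >= 1 and -1 < alpha < 0."""
    h_alpha = c0 * math.gamma(1 + alpha / 2) / math.gamma(-alpha / 2)
    return h_alpha / tau ** (1 + alpha)

if __name__ == "__main__":
    alpha = -0.5  # illustrative order
    exact = fdp_acf(alpha, 10)
    for tau in range(1, 11):
        approx = fdp_acf_approx(alpha, tau)
        print(tau, exact[tau], approx, exact[tau] / approx - 1.0)
```

The printed relative errors shrink quickly with the lag, which is why the approximation is convenient in the variance calculations that follow.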

A f.d.p. has a number of mathematical advantages when a specific model is needed for a l.m.p. since its s.d.f. has a simple mathematical form given by equation (12). In contrast, the discrete fractional Gaussian process (d.f.g.p.) that Mandelbrot[15] has introduced does not have this property. A d.f.g.p. $\{x_t\}$ is defined as a stationary Gaussian process that has a a.c.f. given by

$$C_x(\tau) = \frac{C_x(0)}{2}\left\{ |\tau + 1|^{1-\alpha} - 2|\tau|^{1-\alpha} + |\tau - 1|^{1-\alpha} \right\}$$

for $|\tau| \ge 1$, $C_x(0) > 0$, and $-1 < \alpha \le 1$. For large positive $\tau$,

$$C_x(\tau) \approx \frac{C_x(0)\,\alpha(\alpha - 1)}{2\tau^{1+\alpha}},$$

so the a.c.f.'s for a d.f.g.p. and a f.d.p. have approximately the same structure for large $\tau$. It is also true that a d.f.g.p. has a s.d.f. that obeys a power law in the limit of order $\alpha$, but unfortunately this s.d.f. cannot be expressed in a closed form. Moreover, the simple, yet accurate, approximation to the exact a.c.f. of a f.d.p. given by lemma 15 is often easier to work with than the rather cumbersome exact expression for the a.c.f. of a d.f.g.p. For this reason, we prefer f.d.p.'s to d.f.g.p.'s.
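As a rough illustration of the large-lag behaviour of the d.f.g.p. a.c.f., the sketch below compares the second-difference form $\frac{1}{2}C_x(0)\{|\tau+1|^{1-\alpha} - 2|\tau|^{1-\alpha} + |\tau-1|^{1-\alpha}\}$ with its large-lag expansion $C_x(0)\alpha(\alpha-1)/(2\tau^{1+\alpha})$. The function names and the normalization $C_x(0) = 1$ are assumptions made for the example.

```python
def dfgp_acf(alpha, tau, c0=1.0):
    """a.c.f. of a discrete fractional Gaussian process at lag tau."""
    t = abs(tau)
    if t == 0:
        return c0
    p = 1.0 - alpha
    return 0.5 * c0 * ((t + 1) ** p - 2 * t ** p + (t - 1) ** p)

def dfgp_acf_large_lag(alpha, tau, c0=1.0):
    """Large-lag approximation c0 * alpha * (alpha - 1) / (2 tau^(1 + alpha)), tau > 0."""
    return c0 * alpha * (alpha - 1) / (2 * tau ** (1 + alpha))

if __name__ == "__main__":
    alpha = -0.5  # illustrative order
    for tau in (1, 10, 100, 1000):
        print(tau, dfgp_acf(alpha, tau), dfgp_acf_large_lag(alpha, tau))
```

The two columns agree closely once the lag is moderately large, which is the sense in which the d.f.g.p. and f.d.p. share the same tail structure.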


3. The Sample Mean of a Power Law L.M.P.

Let $\{x_t\}$ represent a power law l.m.p. of order $-1 < \alpha < 0$ with s.d.f. $S_x$. Let $E x_t = \mu_x$. Suppose that we are given a finite portion of a realization of this process, say, $x_1, x_2, \ldots, x_n$, and that we want to estimate $\mu_x$. The natural estimator to consider is the sample mean

$$\bar{x}_n = \frac{1}{n}\sum_{t=1}^{n} x_t .$$

Strong consistency of this estimator is shown in lemma 17, for which we need the following simple result:

Lemma 16: Let $\{x_t\}$ be a power law l.m.p. of order $-1 < \alpha < 0$ with s.d.f. $S_x$. Then 1) $S_x(\omega) = O(|\omega|^{\alpha})$ and 2) $S_x(\omega) - h|\omega|^{\alpha} = o(|\omega|^{\alpha})$ as $\omega \to 0$.

Proof: Suppose first that $-1 < \alpha < 0$. Pick any $\epsilon > 0$. There exists a $\delta \in (0, \pi)$ such that

$$|S_x(\omega) - h\omega^{\alpha}| < \epsilon\omega^{\alpha}$$

for all $0 < \omega < \delta$, which yields statement 2) of the lemma. The above implies that

$$S_x(\omega) < (h + \epsilon)\omega^{\alpha}$$

for $0 < \omega < \delta$. Moreover, since $S_x$ is bounded above for $\omega \in [\delta, \pi]$, it follows that

$$S_x(\omega) \le K|\omega|^{\alpha}$$

on $[-\pi, \pi]$ for some finite constant $K$ and hence that statement 1) holds. □

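Lemma 17: Let $\{x_t\}$, $\mu_x$, and $\bar{x}_n$ be as specified above. Then $\bar{x}_n$ converges to $\mu_x$ a.s.

Proof: We use the following result, which is a direct consequence of corollary 3, p.207, of Hannan[11]: if $\{x_t\}$ is a wide-sense stationary process with s.d.f. $S_x$ and if

$$\int_0^{\delta} S_x(\omega)\, d\omega = O(\delta^{\beta})$$

as $\delta \to 0$ for some $\beta > 0$, then $\bar{x}_n$ converges a.s. to $\mu_x$. By lemma 16, $S_x(\omega) \le K\omega^{\alpha}$ for $\omega > 0$ and some $0 < K < \infty$. Thus

$$\int_0^{\delta} S_x(\omega)\, d\omega \le K\int_0^{\delta} \omega^{\alpha}\, d\omega = \frac{K\delta^{1+\alpha}}{1+\alpha} = O(\delta^{\beta})$$

for $\beta = 1 + \alpha > 0$ as $\delta \to 0$. □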

We want to determine at what rate $\operatorname{var} \bar{x}_n$ decreases to 0 as $n \to \infty$. Unfortunately, since $S_x$ is discontinuous at 0, we cannot quote the standard result (theorem 6.1.2, p.232, of Fuller[8]) that

$$\operatorname{var} \bar{x}_n \approx \frac{2\pi S_x(0)}{n}$$

for large $n$, which in fact is incorrect for a power law l.m.p. The next lemma shows that the rate of decrease is related to the exponent $\alpha$.

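Lemma 18: Let $\{x_t\}$ be as in lemma 17. Then

$$\operatorname{var} \bar{x}_n = O\!\left(\frac{1}{n^{1+\alpha}}\right).$$

Proof: By lemma 16, $S_x(\omega) \le K|\omega|^{\alpha}$ on $[-\pi, \pi]$ for some constant $K$. By theorem 8.23, p.444, of Anderson[3], we have

$$\begin{aligned}
\operatorname{var} \bar{x}_n &= \frac{1}{n}\int_{-\pi}^{\pi} \frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}}\, S_x(\omega)\, d\omega \\
&\le \frac{2K}{n}\int_0^{\pi} \frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}}\, \omega^{\alpha}\, d\omega \\
&\le \frac{2K}{n}\left\{ n\int_0^{1/n} \omega^{\alpha}\, d\omega + n^{-\alpha}\int_{1/n}^{\pi} \frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}}\, d\omega \right\} \\
&\le \frac{2K}{(1+\alpha)n^{1+\alpha}} + \frac{2K}{n^{1+\alpha}}\int_0^{\pi} \frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}}\, d\omega \\
&= \frac{2K\{1 + \pi(1+\alpha)\}}{(1+\alpha)n^{1+\alpha}},
\end{aligned}$$

where we have made use of the following properties of the Fejér kernel:

$$\frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}} \le n$$

(equation (36), p.465 of Anderson[3]) and

$$\int_0^{\pi} \frac{\sin^2\frac{n\omega}{2}}{n\sin^2\frac{\omega}{2}}\, d\omega = \pi$$

(equations (9) and (11), p.461 of Anderson[3]). □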

Let us now specialize to the case where $\{x_t\}$ is a f.d.p. of order $-1 < \alpha < 0$. The s.d.f. $S_x$ and a.c.f. $C_x$ for $\{x_t\}$ are given by equations (12) and (13) respectively. We now give a more precise rate of decrease for $\operatorname{var} \bar{x}_n$ in the following lemma. For use below the lemma actually gives the rate of decrease of the variance of a $k$-decimated mean, defined by

$$\bar{x}_n(k) \equiv \frac{1}{m}\sum_{t=1}^{m} x_{tk},$$

where $m$ is the largest integer that does not exceed $n/k$. A $k$-decimated mean thus uses only every $k$-th data value. Since $\bar{x}_n(1) = \bar{x}_n$, the case $k = 1$ gives the rate of decrease for $\operatorname{var} \bar{x}_n$.

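Lemma 19: $\operatorname{var} \bar{x}_n(k) = \dfrac{2h_\alpha}{n^{1+\alpha}\alpha(\alpha-1)} + O\!\left(\dfrac{1}{n}\right)$ for a f.d.p. with $-1 < \alpha < 0$; moreover, the $O$ term is less than

$$C_x(0)\left\{ \frac{A}{n} + \frac{B}{(1+\alpha)n^2} \right\}$$

in magnitude, where $A$ and $B$ are constants independent of $\alpha$.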

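Proof: The a.c.f. for $\{x_{tk}\}$ is just $C_x(k\tau)$, $\tau = 0, 1, \ldots$. By a simple extension of a well-known result (see, for example, the proof of corollary 6.1.1.2, p.232, of Fuller[8]),

$$\operatorname{var} \bar{x}_n(k) = \frac{1}{m}\left\{ C_x(0) + 2\sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right) C_x(k\tau) \right\}. \qquad (20)$$

By lemma 15, the summation above is equal to

$$\frac{h_\alpha}{k^{1+\alpha}}\left\{ \sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right)\frac{1}{\tau^{1+\alpha}} + \sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right)\frac{1}{\tau^{1+\alpha}}\, O\!\left(\frac{1}{(k\tau)^2}\right) \right\}. \qquad (21)$$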

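Define

$$f(t) = \left(1 - \frac{t}{m}\right)t^{\beta}.$$

A simple argument shows that the values that any given derivative of $f$ assumes on the interval $(1, m)$ all have the same sign if $\beta < 0$. In this case we may use the Euler-Maclaurin summation formula (see, for example, section 5.8 of Hildebrand[12])

$$\sum_{\tau=1}^{m} f(\tau) = \int_1^{m} f(t)\, dt + \frac{1}{2}\left(f(1) + f(m)\right) + E(m;\beta)$$

and know that

$$|E(m;\beta)| \le \frac{1}{12}\left| f'(m) - f'(1) \right|,$$

where $f'$ denotes the first derivative of $f$; and $E(m;\beta)$ has the same sign as the quantity inside the absolute value signs. Computations show that, if $\beta \ne -1$ or $-2$,

$$\sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right)\tau^{\beta} = \frac{m^{\beta+1}}{(\beta+1)(\beta+2)} - \frac{1}{\beta+1} + \frac{1}{(\beta+2)m} + \frac{1}{2} - \frac{1}{2m} + E(m;\beta),$$

where

$$f'(m) - f'(1) = -m^{\beta-1} - \beta + \frac{\beta+1}{m}.$$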

If we let $\beta = -1-\alpha$, then $-1 < \beta < 0$ and we may use the above to show that the first term in the brackets in (21) is equal to

$$\frac{m^{-\alpha}}{\alpha(\alpha-1)} + \frac{1}{\alpha} + \frac{1}{(1-\alpha)m} + \frac{1}{2} - \frac{1}{2m} + E(m;-1-\alpha),$$

with $E(m;-1-\alpha) = O(1)$. By the second part of lemma 15, the second term in the brackets, which we denote as $R(m)$, is such that

$$|R(m)| < \frac{.35}{k^2}\sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right)\frac{1}{\tau^{3+\alpha}}.$$

If we now let $\beta = -3-\alpha$, then $-3 < \beta < -2$ and it follows from the above that

$$|R(m)| < \frac{.35}{k^2}\left\{ \frac{1}{m^{2+\alpha}(1+\alpha)(2+\alpha)} + \frac{1}{2+\alpha} - \frac{1}{m(1+\alpha)} + \frac{1}{2} - \frac{1}{2m} + E(m;-3-\alpha) \right\}.$$

Since $E(m;-3-\alpha) = O(1)$, $R(m) = O(1)$ also. By combining the above results we have

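$$\begin{aligned}
\operatorname{var} \bar{x}_n(k) &= \frac{C_x(0)}{m} + 2h_\alpha\left\{ \frac{1}{(mk)^{1+\alpha}\alpha(\alpha-1)} + \frac{1}{m^2k^{1+\alpha}}\left[\frac{1}{1-\alpha} - \frac{1}{2}\right] + \frac{1}{mk^{1+\alpha}}\left[\frac{1}{\alpha} + \frac{1}{2} + E(m;-1-\alpha) + R(m)\right] \right\} \qquad (22) \\
&= \frac{2h_\alpha}{(mk)^{1+\alpha}\alpha(\alpha-1)} + O\!\left(\frac{1}{m}\right).
\end{aligned}$$

Since $mk = n + O(1)$, an additional easy argument yields the first part of the lemma.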

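To simplify notation, suppose $k = 1$ and let $E \equiv E(n;-1-\alpha)$ and $R \equiv R(n)$. Equation (13) with $\tau = 0$ gives $2h_\alpha = -\alpha\, C_x(0)\,\Gamma(1+\frac{\alpha}{2})/\Gamma(1-\frac{\alpha}{2})$, so the above shows that the absolute value of the $O$ term in the statement of the first part of this lemma is equal to

$$\left| \frac{C_x(0)}{n} + 2h_\alpha\left\{ \frac{1}{n^2}\left[\frac{1}{1-\alpha} - \frac{1}{2}\right] + \frac{1}{n}\left[\frac{1}{\alpha} + \frac{1}{2} + E + R\right] \right\} \right| \le C_x(0)\left\{ \frac{1}{n} - \frac{\alpha\,\Gamma(1+\frac{\alpha}{2})}{n\,\Gamma(1-\frac{\alpha}{2})}\left( -\frac{1}{\alpha} + 1 + |E| + |R| \right) \right\}.$$

Since $|E| < 1$ and

$$|R| < C + \frac{B}{(1+\alpha)n}$$

for finite constants $B$ and $C$ independent of $\alpha$, it follows that the $O$ term is bounded in absolute value by

$$C_x(0)\left\{ \frac{A}{n} + \frac{B}{(1+\alpha)n^2} \right\}$$

for some finite constant $A$ independent of $\alpha$. A similar argument holds when $k \ge 2$. □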
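The Euler-Maclaurin step in the proof can be checked numerically. The sketch below is an illustration only, with hypothetical function names: it evaluates $\sum_{\tau=1}^{m}(1-\tau/m)\tau^{\beta}$ directly, subtracts the integral and endpoint terms, and compares the remainder with the bound $\frac{1}{12}|f'(m)-f'(1)|$, where $f(t) = (1-t/m)t^{\beta}$ and $f'(t) = \beta t^{\beta-1} - (\beta+1)t^{\beta}/m$. The constant $1/12$ is the standard first Euler-Maclaurin coefficient and is assumed here.

```python
def weighted_power_sum(m, beta):
    """Direct evaluation of sum_{tau=1}^{m} (1 - tau/m) * tau**beta."""
    return sum((1.0 - tau / m) * tau ** beta for tau in range(1, m + 1))

def euler_maclaurin_main_terms(m, beta):
    """Integral of f over [1, m] plus (f(1) + f(m)) / 2; requires beta not in {-1, -2}."""
    integral = (m ** (beta + 1) / ((beta + 1) * (beta + 2))
                - 1.0 / (beta + 1)
                + 1.0 / ((beta + 2) * m))
    endpoints = 0.5 - 0.5 / m  # f(1) = 1 - 1/m and f(m) = 0
    return integral + endpoints

def remainder_bound(m, beta):
    """|f'(m) - f'(1)| / 12 with f'(1) = beta - (beta + 1)/m and f'(m) = -m**(beta - 1)."""
    f_prime_1 = beta - (beta + 1) / m
    f_prime_m = -(m ** (beta - 1))
    return abs(f_prime_m - f_prime_1) / 12.0

if __name__ == "__main__":
    m = 100
    for beta in (-0.5, -2.5):  # beta = -1 - alpha and -3 - alpha for alpha = -0.5
        remainder = weighted_power_sum(m, beta) - euler_maclaurin_main_terms(m, beta)
        print(beta, remainder, remainder_bound(m, beta))
```

In both cases the remainder is positive and lies below the bound, in line with the claim that the remainder has the sign of $f'(m) - f'(1)$ and stays $O(1)$ as $m$ grows.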

The most interesting aspect of this lemma is that the leading term for $\operatorname{var} \bar{x}_n(k)$ is independent of $k$. We now make use of this observation to show an interesting property of a f.d.p. Let

$$e(k;n) \equiv \frac{\operatorname{var} \bar{x}_n}{\operatorname{var} \bar{x}_n(k)} \qquad (23)$$

be the relative efficiency and, if the limit exists, let

$$e(k) \equiv \lim_{n \to \infty} e(k;n)$$

be the asymptotic relative efficiency (a.r.e.) of $\bar{x}_n(k)$ with respect to $\bar{x}_n$.

Corollary 24: For a f.d.p. with $-1 < \alpha < 0$, $e(k) = 1$ for all $k$.

Proof: This follows immediately from lemma 19. □

Loosely speaking, corollary 24 says that we can discard an arbitrarily large fixed proportion of observations from a f.d.p. of order $-1 < \alpha < 0$ and still produce an estimate of $\mu_x$ that is as efficient asymptotically as the sample mean of all the observations. This result indicates how "strong" the memory can be in a l.m.p. For contrast, note that, if $\{x_t\}$ were an uncorrelated process, $e(k) = 1/k$.

As a numerical example of the above results, table 1 gives the exact relative efficiencies, $e(k;n)$ in equation (23), for a f.d.p. with $\alpha = -.5$ and $-.9$ for various values of $n$ and $k$. For example, for $n = 500$, $\alpha = -.9$, and $k = 50$, $\bar{x}_n(k)$ and $\bar{x}_n$ are averages of 10 and 500 data values, respectively, and yet $e(k;n) = .94$ indicates that $\bar{x}_n(k)$ is almost as efficient as $\bar{x}_n$. For the sake of comparison, the column marked $1/k$ is the value of $e(k;n)$ for an uncorrelated process.


Table 1

Relative Efficiency of $\bar{x}_n(k)$ with Respect to $\bar{x}_n$

                      alpha = -.5              alpha = -.9
   k     1/k      e(k;100)   e(k;500)      e(k;100)   e(k;500)
   2    .5000      .9375      .9711         .9975      .9994
   5    .2000      .7627      .8782         .9870      .9970
  10    .1000      .5659      .7453         .9653      .9918
  20    .0500      .3649      .5636         .9173      .9796
  50    .0200      .1721      .3186         .7793      .9392
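Entries of this kind can be recomputed from the recursion (14) for the a.c.f. and the finite-sample variance formula (20). The following is a minimal sketch, assuming $C_x(0) = 1$ (the efficiency is a ratio, so the scale cancels) and using hypothetical function names; it is not the program used to produce the table.

```python
def fdp_acf(alpha, max_lag, c0=1.0):
    """C_x(0), ..., C_x(max_lag) via the recursion C(t) = C(t-1)(2t - alpha - 2)/(2t + alpha)."""
    acf = [c0]
    for tau in range(1, max_lag + 1):
        acf.append(acf[-1] * (2 * tau - alpha - 2) / (2 * tau + alpha))
    return acf

def var_decimated_mean(acf, n, k):
    """Variance of the mean of x_k, x_2k, ..., x_mk with m = floor(n / k)."""
    m = n // k
    # The lag-m term has weight (1 - m/m) = 0, so the sum stops at m - 1.
    weighted = sum((1 - tau / m) * acf[k * tau] for tau in range(1, m))
    return (acf[0] + 2 * weighted) / m

def relative_efficiency(alpha, n, k):
    """e(k;n) = var(sample mean) / var(k-decimated mean)."""
    acf = fdp_acf(alpha, n)
    return var_decimated_mean(acf, n, 1) / var_decimated_mean(acf, n, k)

if __name__ == "__main__":
    for k in (2, 5, 10, 20, 50):
        row = [relative_efficiency(a, n, k) for a in (-0.5, -0.9) for n in (100, 500)]
        print(k, 1 / k, " ".join(f"{e:.4f}" for e in row))
```

Setting `alpha = 0.0` makes every lagged covariance zero, so the same function returns $1/k$ whenever $k$ divides $n$, matching the uncorrelated comparison column.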

Theorem 28 below shows that corollary 24 is true for a wider class of power law l.m.p.'s than just f.d.p.'s of order $-1 < \alpha < 0$. Before we can prove that theorem, we need the following result.

Lemma 25: Suppose that $\{y_t\}$ is a power law l.m.p. of order $-1 < \alpha < 0$ with s.d.f. $S_y$ such that the limits $S_y(\frac{2\pi l}{k}\pm)$ exist for $l = 0, \ldots, k-1$ with $k \ge 2$. Then

$$\lim_{n \to \infty} n\{\operatorname{var} \bar{y}_n(k) - \operatorname{var} \bar{y}_n\} = \pi\sum_{l=1}^{k-1}\left\{ S_y\!\left(\frac{2\pi l}{k}+\right) + S_y\!\left(\frac{2\pi l}{k}-\right) \right\}.$$

Proof: The $k$-decimated mean $\bar{y}_n(k)$ is the sample average of $y_k, y_{2k}, \ldots, y_{mk}$, which we may regard as a sample of a subsequence $\{y_{tk}\}$ of the original process $\{y_t\}$. This subsequence has a s.d.f. that is given by the folding theorem (see, for example, p.388 of Anderson[3]):

$$S_k(\omega) = \frac{1}{k}\sum_{l=0}^{k-1} S_y\!\left(\frac{\omega + 2\pi l}{k}\right), \qquad (26)$$

where $S_y$ is defined outside of $[-\pi, \pi]$ by periodic extension. The a.c.f. that corresponds to this subsequence is simply $C_k(\tau) = C_y(k\tau)$. In this notation, equation (20) becomes

$$\operatorname{var} \bar{y}_n(k) = \frac{1}{m}\left\{ C_k(0) + 2\sum_{\tau=1}^{m}\left(1 - \frac{\tau}{m}\right) C_k(\tau) \right\} = \frac{1}{m^2}\int_{-\pi}^{\pi} \frac{\sin^2\frac{m\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_k(\omega)\, d\omega .$$

(See, for example, p.89 of Zygmund[27].) Define $r = \frac{n}{k} - m$ and note that $0 \le r < 1$. We then have
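$$\begin{aligned}
n\{\operatorname{var} \bar{y}_n(k) - \operatorname{var} \bar{y}_n\} &= \frac{n}{m^2}\int_{-\pi}^{\pi} \frac{\sin^2\frac{m\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_k(\omega)\, d\omega - \frac{1}{n}\int_{-\pi}^{\pi} \frac{\sin^2\frac{n\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_y(\omega)\, d\omega \\
&= \frac{1}{2\pi m}\int_{-\pi}^{\pi} \frac{\sin^2\frac{m\omega}{2}}{\sin^2\frac{\omega}{2}}\, f(\omega)\, d\omega + \sum_{i=1}^{4} R_i, \qquad (27)
\end{aligned}$$

where

$$f(\omega) = 2\pi\left\{ kS_k(\omega) - \frac{\sin^2\frac{\omega}{2}}{k^2\sin^2\frac{\omega}{2k}}\, S_y\!\left(\frac{\omega}{k}\right) \right\},$$

$$R_1 = \frac{rk}{m^2}\int_{-\pi}^{\pi} \frac{\sin^2\frac{m\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_k(\omega)\, d\omega,$$

$$R_2 = \frac{r}{mn}\int_{-\pi}^{\pi} \frac{\sin^2\frac{mk\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_y(\omega)\, d\omega,$$

$$R_3 = -\frac{1}{n}\int_{-\pi}^{\pi} \frac{\sin\!\left(mk\omega + \frac{rk\omega}{2}\right)\sin\frac{rk\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_y(\omega)\, d\omega,$$

$$R_4 = -\frac{2}{mk}\int_{\pi/k}^{\pi} \frac{\sin^2\frac{mk\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_y(\omega)\, d\omega .$$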


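Fix $\epsilon > 0$. Then

$$|R_1| \le \frac{k}{2}\int_0^{\delta} \frac{\omega^2}{\sin^2\frac{\omega}{2}}\, S_k(\omega)\, d\omega + \frac{2k}{m^2}\int_{\delta}^{\pi} \frac{S_k(\omega)}{\sin^2\frac{\omega}{2}}\, d\omega < \epsilon$$

by first choosing $\delta$ so small that the first integral is less than $\epsilon/2$ and then choosing $m$ so large that the same is true for the second integral. Thus $R_1 \to 0$ as $n \to \infty$. By similar arguments it follows that $R_2$, $R_3$, and $R_4$ also converge to 0 as $n \to \infty$. By Fejér's theorem (p.89 of Zygmund[27]), as $n \to \infty$ the integral in equation (27) converges to

$$L \equiv \frac{1}{2}\{f(0+) + f(0-)\} = L_1 + L_2,$$

where

$$L_1 = \pi\sum_{l=1}^{k-1}\left\{ S_y\!\left(\frac{2\pi l}{k}+\right) + S_y\!\left(\frac{2\pi l}{k}-\right) \right\}$$

and

$$L_2 = \lim_{\omega \to 0} 2\pi\left\{ S_y\!\left(\frac{\omega}{k}\right)\left[ 1 - \frac{\sin^2\frac{\omega}{2}}{k^2\sin^2\frac{\omega}{2k}} \right] \right\}$$

from the fact that $S_y$ is an even function. Since $\{y_t\}$ is a power law l.m.p., $S_y(\omega) = O(|\omega|^{\alpha})$ as $\omega \to 0$ by lemma 16. It is easy to show that

$$\frac{\sin^2\frac{\omega}{2}}{k^2\sin^2\frac{\omega}{2k}} = 1 + O(\omega^2),$$

from which we may deduce that

$$L_2 = \lim_{\omega \to 0} O(|\omega|^{\alpha})\, O(\omega^2) = 0 .$$

The lemma now follows. □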

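Theorem 28: Suppose that $\{y_t\}$ is a power law l.m.p. of order $-1 < \alpha < 0$ with s.d.f. $S_y$ such that the limits $S_y(\omega+)$ and $S_y(\omega-)$ exist for all $\omega \in (-\pi, \pi)$. Then $e(k) = 1$ for all $k$.

Proof: By lemma 25, for any $k$

$$n\{\operatorname{var} \bar{y}_n(k) - \operatorname{var} \bar{y}_n\} = L_1 + o(1).$$

We may rewrite the above as

$$\frac{\operatorname{var} \bar{y}_n(k)}{\operatorname{var} \bar{y}_n} = 1 + \frac{L_1}{n\operatorname{var} \bar{y}_n} + o\!\left(\frac{1}{n\operatorname{var} \bar{y}_n}\right).$$

Since

$$n\operatorname{var} \bar{y}_n = \frac{1}{n}\int_{-\pi}^{\pi} \frac{\sin^2\frac{n\omega}{2}}{\sin^2\frac{\omega}{2}}\, S_y(\omega)\, d\omega$$

and $S_y(0+) = S_y(0-) = \infty$, by the version of Fejér's theorem given by Zygmund[27], $n\operatorname{var} \bar{y}_n \to \infty$. Hence

$$\frac{1}{e(k)} = \lim_{n \to \infty} \frac{\operatorname{var} \bar{y}_n(k)}{\operatorname{var} \bar{y}_n} = 1$$

as claimed. □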

The  contrapositive  of  the  next  lemma  shows  that  the  conclusion  of  theorem  28  can  only 
hold  for  stationary  processes  that  are  l.m.p.’s  in  the  covariance  sense. 

Lemma 29: Let $\{y_t\}$ be a stationary process with a.c.f. $C_y$. If $\sum |C_y(\tau)| < \infty$, there exists a $k_0$ such that $e(k) < 1$ for all $k \ge k_0$.

Proof: If $\sum |C_y(\tau)| < \infty$, then $\{y_t\}$ has a continuous s.d.f. $S_y$ by theorem 3.1.9, p.110, of Fuller[8]. Likewise, the $k$-decimated mean has a continuous s.d.f. $S_k$ given by equation (26). Since both $S_y$ and $S_k$ are continuous, it follows from theorem 6.1.2, p.232, of Fuller[8] that as $n \to \infty$

$$\mathrm{var}\,\bar{y}_n = \frac{2\pi S_y(0)}{n} + o(n^{-1})$$

and

$$\mathrm{var}\,\bar{y}_n(k) = \frac{2\pi k S_k(0)}{n} + o(n^{-1}).$$

Since $S_y$ is continuous and strictly positive on at least one interval of non-zero length, it follows that

$$k S_k(0) = \sum_{l=0}^{k-1} S_y\!\left(\frac{2\pi l}{k}\right) > S_y(0)$$

for all $k \ge k_0$ for some positive integer $k_0$. Thus

$$e(k) = \frac{S_y(0)}{k S_k(0)} < 1$$

for all $k \ge k_0$ and the lemma is proved. □
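As a numerical illustration of theorem 28 and lemma 29, the following Python sketch computes the finite-sample ratio of the variance of the ordinary mean to that of the $k$-decimated mean directly from an autocorrelation sequence. The function names are hypothetical. The f.d.p. autocorrelations are generated under the assumption (not taken from the text above) that the process is the usual fractionally differenced model with $d = -\alpha/2$, for which $\rho(\tau) = \rho(\tau-1)(\tau-1+d)/(\tau-d)$.

```python
def var_of_mean(acf, m, step=1):
    """Variance of the mean of m observations spaced `step` lags apart."""
    total = m * acf[0]
    for tau in range(1, m):
        total += 2.0 * (m - tau) * acf[step * tau]
    return total / (m * m)


def fdp_acf(alpha, max_lag):
    """Autocorrelations of a fractionally differenced process (assumes d = -alpha/2)."""
    d = -alpha / 2.0
    rho = [1.0]
    for tau in range(1, max_lag + 1):
        rho.append(rho[-1] * (tau - 1 + d) / (tau - d))
    return rho


def relative_efficiency(acf, n, k):
    """var(mean of n) / var(k-decimated mean of n); n is assumed to be a multiple of k."""
    return var_of_mean(acf, n) / var_of_mean(acf, n // k, step=k)


n, k = 1000, 2
white = [1.0] + [0.0] * n                      # uncorrelated process
ar1 = [0.5 ** tau for tau in range(n + 1)]     # AR(1), summable a.c.f.
long_memory = fdp_acf(-0.9, n)                 # f.d.p. with alpha = -0.9

eff_white = relative_efficiency(white, n, k)
eff_ar1 = relative_efficiency(ar1, n, k)
eff_fdp = relative_efficiency(long_memory, n, k)
print(eff_white, eff_ar1, eff_fdp)
```

For the white noise and AR(1) sequences the ratio stays clearly below 1, whereas for the long memory sequence it is already very close to 1 at this sample size, in line with the two results above.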

In the case where the a.r.e. between two competing estimators for a quantity is 1, Hodges and Lehmann[13] propose the use of deficiency as a useful criterion for distinguishing between the estimators. Given $x_1, \ldots, x_n$, consider the estimator $\bar{x}_n$ of $\mu_x$ with variance $\mathrm{var}\,\bar{x}_n$. If it exists, let $d = d(k;n)$ be the smallest integer such that

$$\mathrm{var}\,\bar{x}_{n+d}(k) \le \mathrm{var}\,\bar{x}_n < \mathrm{var}\,\bar{x}_{n+d-1}(k). \qquad (30)$$

Then $d(k;n)$ is called the deficiency of $\bar{x}_n(k)$ with respect to $\bar{x}_n$. Typically it represents the number of additional observations needed by a $k$-decimated mean to perform approximately as well as the sample mean with $n$ observations. If it exists,

$$d(k) = \lim_{n \to \infty} d(k;n)$$

is called the asymptotic deficiency.

Theorem 31: For a f.d.p. with $-1 < \alpha < 0$, $d(k) = \infty$ for all $k \ge 2$.

Proof: The first portion of our argument is a slight modification of one due to Hodges and Lehmann[13]. It is shown in corollary 38 below that $\mathrm{var}\,\bar{x}_n$ is a strictly decreasing function of $n$. It follows from the proof of that same corollary that $\mathrm{var}\,\bar{x}_n(k)$ is a decreasing function of $n$ for $n \ge k$. Since $\mathrm{var}\,\bar{x}_1 = \mathrm{var}\,\bar{x}_k(k)$; since $\mathrm{var}\,\bar{x}_n > 0$ for all $n$; and since both $\mathrm{var}\,\bar{x}_n$ and $\mathrm{var}\,\bar{x}_n(k)$ decrease to 0 as $n \to \infty$ by lemma 19, it follows that for $n \ge k$ there is a unique integer $d = d(k;n)$ such that equation (30) holds. Since $\mathrm{var}\,\bar{x}_n \to 0$ as $n \to \infty$, it follows from that equation that $\mathrm{var}\,\bar{x}_{n+d}(k) \to 0$ also. By lemma 19,


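$$\mathrm{var}\,\bar{x}_{n+d}(k) = \frac{c}{(n+d)^{1+\alpha}} + \frac{b_n}{n+d} + o\!\left(\frac{1}{n+d}\right), \qquad (32)$$

where $c = 2h/\{\alpha(\alpha-1)\}$ and $\{b_n\}$ is such that $|b_n| < b < \infty$ for all $n$ and for some constant $b$. Thus $n+d \to \infty$. Lemma 19 further shows that

$$\mathrm{var}\,\bar{x}_n = \frac{c}{n^{1+\alpha}} + \frac{a_n}{n} + o\!\left(\frac{1}{n}\right) \qquad (33)$$

for $\{a_n\}$ such that $|a_n| < a < \infty$ for all $n$ and some constant $a$. By equation (30) we have

$$\frac{c}{(n+d)^{1+\alpha}} + \frac{b_n}{n+d} + o\!\left(\frac{1}{n+d}\right) \le \frac{c}{n^{1+\alpha}} + \frac{a_n}{n} + o\!\left(\frac{1}{n}\right) < \frac{c}{(n+d-1)^{1+\alpha}} + \frac{b_n}{n+d-1} + o\!\left(\frac{1}{n+d-1}\right).$$

Since $n+d \to \infty$ it follows that

$$\frac{c}{n^{1+\alpha}} + \frac{a_n}{n} + o\!\left(\frac{1}{n}\right) = \frac{c}{(n+d)^{1+\alpha}} + \frac{b_n}{n+d} + o\!\left(\frac{1}{n+d}\right)$$

or

$$\frac{1}{n^{1+\alpha}} \left\{ c + \frac{a_n + o(1)}{n^{-\alpha}} \right\} = \frac{1}{(n+d)^{1+\alpha}} \left\{ c + \frac{b_n + o(1)}{(n+d)^{-\alpha}} \right\}.$$

The above indicates that $(n+d)/n \to 1$ as $n \to \infty$ and hence that $d/n = o(1)$. We may now write

$$1 + \frac{d}{n} = \left[ c + \frac{b_n + o(1)}{(n+d)^{-\alpha}} \right]^{\frac{1}{1+\alpha}} \left[ c + \frac{a_n + o(1)}{n^{-\alpha}} \right]^{-\frac{1}{1+\alpha}}$$

$$= \left[ 1 - \frac{a_n + o(1)}{(1+\alpha) c\, n^{-\alpha}} \right] \left[ 1 + \frac{b_n + o(1)}{(1+\alpha) c\, n^{-\alpha}} \right]$$

$$= 1 + \frac{b_n - a_n + o(1)}{(1+\alpha) c\, n^{-\alpha}} .$$

Thus we have

$$d = n^{1+\alpha}\, \frac{b_n - a_n + o(1)}{(1+\alpha) c} .$$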

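The theorem follows if we can show that $\liminf\, (b_n - a_n) > 0$ for all $k \ge 2$. By equations (32) and (33), this is the same as showing that

$$\liminf_{n \to \infty}\; n\{\mathrm{var}\,\bar{x}_n(k) - \mathrm{var}\,\bar{x}_n\} > 0 . \qquad (34)$$

It follows from lemma 25 that

$$\lim_{n \to \infty} n\{\mathrm{var}\,\bar{x}_n(k) - \mathrm{var}\,\bar{x}_n\} = 2\pi \sum_{l=1}^{k-1} S_x\!\left(\frac{2\pi l}{k}\right) > 0 .$$

Thus condition (34) holds and the theorem follows. □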

Table 2 shows the exact deficiencies for the same values of $n$, $\alpha$, and $k$ as are used in table 1. The number in parentheses below each value of $d(k;n)$ represents the total number of data points that the $k$-decimated mean utilizes. This number is simply $\{n + d(k;n)\}/k$; the ratio of this number to $n$ (the number of data points that the sample mean uses) approaches $1/k$ as $n \to \infty$, since $d(k;n)/n$ is $o(1)$ by the proof of theorem 31. For example, for $n = 500$, $\alpha = -.9$, and $k = 50$, the table shows that 300 additional observations are required for a 50-decimated mean to have at least as small a variance as that of an ordinary mean of 500 observations. However, a 50-decimated mean of 800 observations actually utilizes only 16 observations. The ratio 16/500 is roughly 1/50, as the above theory suggests. For the sake of comparison, the column that is labelled $d_w(k;100)$ contains the corresponding deficiencies for an uncorrelated (white noise) process.
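The relation between a deficiency and the number of points actually used can be checked against the tabulated values. The Python sketch below does this for the column $\alpha = -.9$, $n = 500$. The helper name is hypothetical, and the $k$ values 2, 5, 10, 20, 50 are an assumption: they are the values for which $\{n + d(k;n)\}/k$ reproduces the parenthesized counts in table 2.

```python
def points_used(n, d, k):
    """Number of observations a k-decimated mean of n + d points actually uses."""
    return (n + d) // k


n = 500
# (k, d(k;500), tabulated count) for alpha = -.9, as read from table 2
alpha_09_n500 = [(2, 4, 252), (5, 20, 104), (10, 50, 55), (20, 100, 30), (50, 300, 16)]

consistent = []
for k, d, tabulated in alpha_09_n500:
    used = points_used(n, d, k)
    consistent.append(used == tabulated)
    # compare the fraction of points used with its limiting value 1/k
    print(k, d, used, round(used / n, 3), round(1.0 / k, 3))
```

The printed fractions show how far the ratio of points used to $n$ still is from its limit $1/k$ at this sample size.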

Table 2

Deficiency of $\bar{x}_n(k)$ with Respect to $\bar{x}_n$

                       α = -.5                α = -.9
  k   d_w(k;100)   d(k;100)   d(k;500)   d(k;100)   d(k;500)

  2      100          14         30          4          4
        (100)        (57)      (265)       (52)      (252)

  5      400          60        135         15         20
        (100)        (32)      (127)       (23)      (104)

 10      900         130        310         40         50
        (100)        (23)       (81)       (14)       (55)

 20     1900         280        660         80        100
        (100)        (19)       (58)        (9)       (30)

 50     4900         700       1600        200        300
        (100)        (16)       (42)        (6)       (16)

The above results on relative efficiency and deficiency have some practical consequences. If one is interested in estimating $\mu_x$ in a given time frame from a finite sample of data from a f.d.p. and each observation costs a certain amount, table 1 on relative efficiencies shows that it is possible by means of decimation to reduce the cost considerably at the expense of a modest decrease in efficiency if $\alpha$ is close enough to $-1$. Conversely, if only a fixed number of observations can be made and the time frame is flexible, table 2 on deficiencies suggests that the data be spaced as far apart in time as is practical. These comments assume that the variance is an adequate criterion for choosing an estimator.

We  conclude  this  section  by  proving  a  lemma  concerning  the  variance  of  the  mean  of  a 
stationary  process,  a  corollary  of  which  we  used  above  in  theorem  31. 

Lemma 35: Let $y_1, y_2, \ldots$ represent a wide sense stationary process with a.c.f. given by $C_y$ such that $C_y(\tau) < C_y(0)$ for all $\tau \ge 1$. Let $\bar{y}_n$ represent the sample mean of the first $n$ observations. Then $\mathrm{var}\,\bar{y}_n < \mathrm{var}\,\bar{y}_{n-1}$ if and only if

$$\sum_{\tau=1}^{n-1} (\tau - M_n)\, C_y(\tau) < \frac{M_n C_y(0)}{2}, \qquad (36)$$

where

$$M_n = \frac{n(n-1)}{2n-1} .$$

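Proof: Without loss of generality, we may assume for convenience that $E y_t = 0$. Consider the following minimization problem: subject to $(n-1)\alpha + \beta = 1$, find $\alpha$ (and hence $\beta$) such that

$$V(\alpha) \equiv \mathrm{var}\left\{ \alpha \sum_{t=1}^{n-1} y_t + \beta y_n \right\}$$

is minimized. Note that $V(\frac{1}{n}) = \mathrm{var}\,\bar{y}_n$ and $V(\frac{1}{n-1}) = \mathrm{var}\,\bar{y}_{n-1}$. It is clear that $V(\alpha) = A\alpha^2 + B\alpha + C$ for some $A$, $B$, and $C$ and hence that $V(\alpha)$ is minimized when $\alpha = \alpha_{MIN} = -\frac{B}{2A}$. Since $V$ is quadratic in $\alpha$, an easy geometric argument shows that $\mathrm{var}\,\bar{y}_n$ is $<$, $=$, or $>$ $\mathrm{var}\,\bar{y}_{n-1}$ as $\alpha_{MIN}$ is $<$, $=$, or $>$ $\frac{1}{2}\left(\frac{1}{n} + \frac{1}{n-1}\right)$. Some algebra shows that

$$A = E\left\{ \sum_{t=1}^{n-1} y_t - (n-1) y_n \right\}^2$$

$$= E\left\{ \Big(\sum_{t=1}^{n-1} y_t\Big)^2 - 2(n-1) \sum_{t=1}^{n-1} y_t y_n + (n-1)^2 y_n^2 \right\}$$

$$= E\Big(\sum_{t=1}^{n-1} y_t\Big)^2 - 2(n-1) \sum_{\tau=1}^{n-1} C_y(\tau) + (n-1)^2 C_y(0)$$

and

$$B = 2E\left\{ \sum_{t=1}^{n-1} y_t y_n - (n-1) y_n^2 \right\} = 2\left\{ \sum_{\tau=1}^{n-1} C_y(\tau) - (n-1) C_y(0) \right\} .$$

Some algebra shows that

$$E\Big(\sum_{t=1}^{n-1} y_t\Big)^2 = (n-1) C_y(0) + 2(n-1) \sum_{\tau=1}^{n-1} C_y(\tau) - 2 \sum_{\tau=1}^{n-1} \tau C_y(\tau) .$$

We then have

$$\alpha_{MIN} = \frac{(n-1) C_y(0) - \sum_{\tau=1}^{n-1} C_y(\tau)}{n(n-1) C_y(0) - 2 \sum_{\tau=1}^{n-1} \tau C_y(\tau)} .$$

If we substitute this equality into

$$\alpha_{MIN} < \frac{2n-1}{2n(n-1)},$$

which is true if and only if $\mathrm{var}\,\bar{y}_n < \mathrm{var}\,\bar{y}_{n-1}$, then inequality (36) follows and the lemma is proved. □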
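The equivalence stated in lemma 35 is easy to check numerically. The Python sketch below compares the direct test $\mathrm{var}\,\bar{y}_n < \mathrm{var}\,\bar{y}_{n-1}$ with inequality (36) for a few autocorrelation sequences. The function names and the particular AR coefficients are illustrative assumptions; the variance of the mean is computed from the standard formula $n^{-2}\{n C_y(0) + 2\sum_{\tau=1}^{n-1}(n-\tau) C_y(\tau)\}$.

```python
def var_of_mean(acf, n):
    """Variance of the mean of the first n observations."""
    total = n * acf[0]
    for tau in range(1, n):
        total += 2.0 * (n - tau) * acf[tau]
    return total / (n * n)


def inequality_36(acf, n):
    """True when sum_{tau=1}^{n-1} (tau - M_n) C(tau) < M_n C(0) / 2."""
    m_n = n * (n - 1) / (2 * n - 1)
    lhs = sum((tau - m_n) * acf[tau] for tau in range(1, n))
    return lhs < m_n * acf[0] / 2.0


sequences = {
    "AR(1), phi = 0.6": [0.6 ** tau for tau in range(40)],
    "AR(1), phi = -0.6": [(-0.6) ** tau for tau in range(40)],
    # AR(2) with phi_1 = 0, phi_2 = 0.9: rho(tau) = 0.9**(tau/2) for even tau, 0 for odd tau
    "AR(2), phi_2 = 0.9": [0.9 ** (tau // 2) if tau % 2 == 0 else 0.0 for tau in range(40)],
}

mismatches = []
for name, acf in sequences.items():
    for n in range(2, 40):
        decreased = var_of_mean(acf, n) < var_of_mean(acf, n - 1)
        if decreased != inequality_36(acf, n):
            mismatches.append((name, n))
print(mismatches)
```

The AR(2) sequence is included because its variance of the mean is not monotone in $n$, so both branches of the equivalence are exercised.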

Lemma 37: If $C_y(0) > C_y(1) > \cdots > C_y(n-1) > 0$, then inequality (36) holds.

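Proof: We use an inequality of Chebyshev: if $a_1 \le a_2 \le \cdots \le a_n$ and $b_1 \ge b_2 \ge \cdots \ge b_n$, then

$$\sum_{k=1}^{n} a_k b_k \le \frac{1}{n} \Big(\sum_{k=1}^{n} a_k\Big) \Big(\sum_{k=1}^{n} b_k\Big) .$$

(See 3.2.7, p. 11 of Abramowitz and Stegun[1]). We now have

$$\sum_{\tau=1}^{n-1} (\tau - M_n)\, C_y(\tau) \le \frac{1}{n-1} \Big\{ \sum_{\tau=1}^{n-1} (\tau - M_n) \Big\} \Big\{ \sum_{\tau=1}^{n-1} C_y(\tau) \Big\} < C_y(0) \sum_{\tau=1}^{n-1} (\tau - M_n) = \frac{M_n C_y(0)}{2}$$

and the lemma holds. □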
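The final equality in the proof of lemma 37 rests on the value of $\sum_{\tau=1}^{n-1}(\tau - M_n)$. A short worked step, using only $M_n = n(n-1)/(2n-1)$ from lemma 35, is:

$$\sum_{\tau=1}^{n-1} (\tau - M_n) = \frac{n(n-1)}{2} - \frac{n(n-1)^2}{2n-1} = \frac{n(n-1)\{(2n-1) - 2(n-1)\}}{2(2n-1)} = \frac{n(n-1)}{2(2n-1)} = \frac{M_n}{2} .$$

Since this sum is positive and the average of $C_y(1), \ldots, C_y(n-1)$ is strictly less than $C_y(0)$ under the hypothesis of the lemma, the strict inequality in the proof follows.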
Corollary 38: If $\{x_t\}$ is a f.d.p. of order $-1 < \alpha < 0$, then $\mathrm{var}\,\bar{x}_n$ is a strictly decreasing sequence.

Proof: By equation (14)

$$C_x(\tau) = \frac{2\tau - 2 - \alpha}{2\tau + \alpha}\, C_x(\tau - 1) < C_x(\tau - 1)$$

for all $\tau \ge 1$. This corollary is thus an immediate consequence of lemma 37. □

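Let us note in passing that $\mathrm{var}\,\bar{y}_n \le \mathrm{var}\,\bar{y}_{n-1}$ need not hold for all $n$ for all stationary processes. Let $\{y_t\}$ be a second order autoregressive process with $\phi_1 = 0$ and $|\phi_2| < 1$. (By pages 60-61 of Box and Jenkins[4], these coefficients do yield a stationary process.) Now $\rho_1 = 0$ and $\rho_2 = \phi_2$, so that $\mathrm{var}\,\bar{y}_2 = C_y(0)/2$ and $\mathrm{var}\,\bar{y}_3 = C_y(0)(3 + 2\rho_2)/9$, and it is easy to show that $\mathrm{var}\,\bar{y}_3 > \mathrm{var}\,\bar{y}_2$ if $\rho_2 > .75$.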
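The threshold in this counterexample can be confirmed with a few lines of arithmetic. The Python sketch below uses the standard expressions for the variance of a mean of two and of three observations, normalized by the process variance; the function names are hypothetical.

```python
def var_mean_2(rho1):
    """var(mean of 2 observations) / C(0)."""
    return (2.0 + 2.0 * rho1) / 4.0


def var_mean_3(rho1, rho2):
    """var(mean of 3 observations) / C(0)."""
    return (3.0 + 4.0 * rho1 + 2.0 * rho2) / 9.0


rho1 = 0.0  # phi_1 = 0 gives rho_1 = 0
for rho2 in (0.5, 0.7, 0.8, 0.9):
    increases = var_mean_3(rho1, rho2) > var_mean_2(rho1)
    print(rho2, var_mean_2(rho1), var_mean_3(rho1, rho2), increases)
```

The variance of the mean of three observations exceeds that of two only for the values of $\rho_2$ above .75.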


1.4  The  Sample  Variance  of  a  F.D.P. 

Let $\{x_t\}$ represent a f.d.p. of order $-1 < \alpha < 0$ with a.c.f. $C_x$. Let us assume that both $E x_t = \mu$ and $\mathrm{var}\,x_t = C_x(0)$ are unknown and that we want to estimate $C_x(0)$ from a data sample $x_1, \ldots, x_n$. A natural estimator to consider is the sample variance

$$s_n^2 = \frac{1}{n} \sum_{t=1}^{n} (x_t - \bar{x}_n)^2 .$$

It can be shown (see David[6]) that for any stationary process

$$0 \le E s_n^2 \le C_x(0) .$$

The following theorem shows that this estimator can be severely biased for a f.d.p.

Theorem 39: For every sample size $N$ and every $\epsilon > 0$, there exists a f.d.p. of order $-1 < \alpha < 0$ such that

$$E s_n^2 < \epsilon\, C_x(0) \qquad (40)$$

for all $n \le N$.

Proof: Fix $\epsilon > 0$ and $N$. First, note that

$$E s_n^2 = \frac{1}{n} \sum_{t=1}^{n} E\{(x_t - \mu) - (\bar{x}_n - \mu)\}^2 = C_x(0) - \mathrm{var}\,\bar{x}_n . \qquad (41)$$

By  the  use  of  equation  (13)  and  the  second  part  of  lemma  19,  the  above  may  be  rewritten  as 


r(1+f}  1  1 

— — — +  0(-)  +  0(^^), 


«1+<x(  l-a)r(l-y) 


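where the bounds for the $O$ terms may be chosen independently of $\alpha$. Let $\alpha = -1 + \frac{1}{n}$. We then have

$$\frac{E s_n^2}{C_x(0)} = 1 - \frac{\Gamma\!\left(\frac{1}{2} + \frac{1}{2n}\right)}{n^{1/n} \left(2 - \frac{1}{n}\right) \Gamma\!\left(\frac{3}{2} - \frac{1}{2n}\right)} + O\!\left(\frac{1}{n}\right) .$$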

Since the term with the gamma functions in the above equation converges to 1 as $n \to \infty$, it is possible to find a $n_0$ such that equation (40) holds for all $n \ge n_0$. Let $n_1$ be the greater of $n_0$ and $N$ and set $\alpha_1 = -1 + \frac{1}{n_1}$. Then equation (40) holds when $n = n_1$ for a f.d.p. of order $\alpha_1$. We need only show that it also holds when $n < n_1$ to complete the proof. By corollary 38, $\mathrm{var}\,\bar{x}_n$ is a strictly decreasing function of $n$, which means that $E s_n^2$ is a strictly increasing function of $n$ because of equation (41). Hence the left hand side of equation (40) is also a strictly increasing function of $n$, which shows that (40) holds for all $n \le n_1$. □

The thrust of this theorem is that $s_n^2$ can severely underestimate $C_x(0)$ even for very large sample sizes if $\alpha$ is close to $-1$, a fact that Allan[2] noted in a slightly different form in his 1966 paper. Table 3 gives values of the ratio of $E s_n^2$ to $C_x(0)$ for various $\alpha$'s and sample sizes $n$. For contrast, the column that is marked "white" gives the corresponding values for a white noise process.
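The size of this bias can be explored numerically through the identity $E s_n^2 / C_x(0) = 1 - \mathrm{var}\,\bar{x}_n / C_x(0)$. The Python sketch below is illustrative only and does not reproduce table 3. The function names are hypothetical, and the f.d.p. autocorrelations assume the usual fractionally differenced model with $d = -\alpha/2$, so that $\rho(\tau) = \rho(\tau-1)(\tau-1+d)/(\tau-d)$.

```python
def fdp_acf(alpha, max_lag):
    """Autocorrelations of a fractionally differenced process (assumes d = -alpha/2)."""
    d = -alpha / 2.0
    rho = [1.0]
    for tau in range(1, max_lag + 1):
        rho.append(rho[-1] * (tau - 1 + d) / (tau - d))
    return rho


def expected_s2_ratio(rho, n):
    """E s_n^2 / C(0) = 1 - var(mean of n) / C(0), for s_n^2 with divisor n."""
    total = n * rho[0]
    for tau in range(1, n):
        total += 2.0 * (n - tau) * rho[tau]
    return 1.0 - total / (n * n)


sizes = (10, 100, 1000)
white = [1.0] + [0.0] * 1000
print("white", [round(expected_s2_ratio(white, n), 3) for n in sizes])
for alpha in (-0.5, -0.9, -0.99):
    rho = fdp_acf(alpha, 1000)
    print(alpha, [round(expected_s2_ratio(rho, n), 3) for n in sizes])

ratio_near_minus_one = expected_s2_ratio(fdp_acf(-0.99, 1000), 1000)
```

For white noise the ratio is $1 - 1/n$, whereas for $\alpha$ near $-1$ it remains far below 1 even at $n = 1000$.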


One hope for correcting this bias in $s_n^2$ is to estimate $\alpha$ by some means. Mohr[16] utilizes this procedure in her study of the Nile River data for which $\alpha = -.7$. Table 3 shows, however, that the bias depends strongly on $\alpha$ for $\alpha$ close to $-1$, so attempts to produce reasonable estimates of $C_x(0)$ for such processes by a bias correction of $s_n^2$ seem hopeless.